Abstract — This paper examines the value of storage in securing
reliability of a system with uncertain supply and demand,
and supply friction. The storage is frictionless as a supply 
source, but once used, it cannot be filled up instantaneously. 
The focus application is a power supply network in which 
the base supply and demand are assumed to match perfectly, 
while deviations from the base are modeled as random shocks 
with stochastic arrivals. Due to friction, the random surge 
shocks cannot be tracked by the main supply sources. Storage, 
when available, can be used to compensate, fully or partially, 
for the surge in demand or loss of supply. The problem of 
optimal utilization of storage with the objective of maximizing 
system reliability is formulated as minimization of the expected 
discounted cost of blackouts over an infinite horizon. It is shown 
that when the stage cost is linear in the size of the blackout, 
the optimal policy is myopic in the sense that all shocks are 
compensated by storage up to the available level of storage. 
However, when the stage cost is strictly convex, it may be 
optimal to curtail some of the demand and allow a small current 
blackout in the interest of maintaining a higher level of reserve 
to avoid a large blackout in the future. The value of storage
capacity in improving the system's reliability, as well as the effects
of the associated optimal policies under different stage costs on 
the probability distribution of blackouts are examined. 

Index Terms — Storage, Ramp Constraints, Reliability, Prob- 
ability of Large Blackouts 

I. Introduction 

Supply and demand in electric power networks are subject
to exogenous, impulsive, and unpredictable shocks due to
generator outages, failure of transmission equipment, or
unexpected changes in weather conditions. On the other
hand, environmental causes along with price pressure have 
led to a global trend in large-scale integration of renewable 
resources with stochastic output. This is likely to increase the 
magnitude and frequency of impulsive shocks to the supply 
side of the network. We ask, what is the value of storage in 
mitigating volatility of supply and demand, and what are the 
fundamental limits that cannot be overcome by storage due to 
physical ramp constraints, and finally, what are the impacts of 
different control policies on system reliability, for instance, 
on the expected cost or the probability of large blackouts? 

In this paper our focus is on the reliability value of storage, 
defined as the maximal improvement in system reliability as 
a function of storage capacity. Two metrics for quantifying 
reliability in a system are considered: The first is the expected 
long-term discounted cost of blackouts (the cost of blackouts
(COB) metric), and the second is the probability of loss of
load by a certain amount or less. 

We model the system as a supply-demand model that is 
subject to random arrivals of energy deficit shocks, and a stor- 
age of limited capacity, with a ramp constraint on charging, 
but no constraint on discharging. The storage may be used to 
partially or completely mask the shocks to avoid blackouts. 
We formulate the problem of optimal storage management as 
the problem of minimization of the COB metric, and provide 
several characterizations of the optimal cost function. By 
ignoring other factors such as the environment, cost of energy 
or storage, we characterize the value of storage purely from 
a reliability perspective, and examine the effects of physical 
constraints on system reliability. Moreover, for a general 
convex stage cost function, we present various structural 
properties of the optimal policy. 

In particular, we prove that for a linear stage cost, a myopic 
policy which compensates for all shocks regardless of their 
size by draining from storage as much as possible, is optimal. 
However, for nonlinear stage costs where the penalty for 
larger blackouts is significantly higher, the myopic policy 
is not optimal. Intuitively, the optimal policy is inclined to 
mitigate large blackouts at the cost of allowing more frequent 
small blackouts. Our numerical results confirm this intuition. 
We further investigate the value of additional storage under 
different control policies, and for different ranges of system 
parameters. Our results suggest that if the ratio of the average
rate of deficit shocks to the storage ramp rate is sufficiently large,
there is a critical level of storage capacity above which the
value of additional capacity quickly diminishes. When
this ratio is around three or above, there appears to be a second
critical storage size below which storage capacity
provides very little value. Finally, we investigate the effect
of storage size and volatility of the demand/supply process 
on the probability of large blackouts under various policies. 
We observe that for all control policies, there appears to be 
a critical level of storage size, above which the probability 
of suffering large blackouts diminishes quickly. 

Recent works have examined the effects of ramp constraints
on the economic value of storage [1]. Herein, our
focus is on reliability. Prior research on using queueing
models for characterization of system reliability, particularly
in power systems, has been reported in [2] and [3]. Similar
models and concepts exist in the queueing theory literature
[4], [5], perhaps with different application contexts. Despite
similarities, our model differs from those of [4], [5] in
several ways. We assume that the storage capacity (reserve in
their model) is fixed and find the optimal policy for withdrawing
from storage (consuming from reserve), as opposed
to always draining the reserve and optimizing the capacity.
Another difference is that our model of uncertainty is a
compound Poisson process instead of the Brownian motion
used in [4], [5]. We show that the myopic policy of always
draining storage to mask every energy deficit shock is not
optimal for strictly convex costs, and investigate the effects
of nonlinear stage costs (strictly convex cost of blackouts)
on the optimal policy and the statistics of blackouts.

The organization of this paper is as follows. Section
II presents the elements of the model and the problem
formulation. Section III includes the main analytical results.
Section IV presents the numerical simulations and discussions.
Finally, Section V includes the concluding remarks.

Notation. Throughout the paper, $\mathbf{1}_A$ denotes the indicator
function of a set $A$. The operator $[x]^+ = \max\{0, x\}$ is the
projection operator onto the nonnegative orthant.

II. The Model 

We examine an abstract model of a system consisting of a
single consumer, a single fully controllable supplier, a supplier
with stochastic output (e.g., wind), and a storage system
with finite capacity (Figure 1). These agents each represent
an aggregate of several small consumers and producers. The
details of the model are outlined below.

Fig. 1. Layout of the physical layer of a power supply network with
conventional and renewable generation, storage, and demand.

A. Supply 

1) Controllable Supply: The controllable supply process
is denoted by $G = \{G_t : t \ge 0\}$, where $G_t$ is the power
output at time $t \ge 0$. It is assumed that the supplier's
production is subject to an upward ramp constraint, in the
sense that its output cannot increase instantaneously,

$$\frac{G_t - G_{t'}}{t - t'} \le \zeta, \quad \forall t : 0 \le t < t'.$$

We do not assume a downward ramp constraint or a maximum
capacity constraint on $G_t$. Thus, production can shut
down instantaneously, and can meet any large demand sufficiently
far in the future.

2) Renewable Supply: The renewable supply process is
denoted by $R = \{R_t : t \ge 0\}$. It is assumed that
$R$ can be modeled as a process with two components:
$R = \bar{R} + \Delta R$, where $\bar{R} = \{\bar{R}_t : t \ge 0\}$ is a deterministic
process representing the predicted renewable supply, and
$\Delta R = \{\Delta R_t : t \ge 0\}$ is the residual supply, assumed to
be a random arrival process. Thus, at any given time $t \ge 0$,
the total forecast supply from the renewable and controllable
generators is given by $G_t + \bar{R}_t$.



B. Demand 

The demand process is denoted by $D = \{D_t : t \ge 0\}$,
where $D_t$ is the total power demand at time $t$, assumed
to be exogenous and inelastic. Similar to the renewable
supply, $D$ has two components: $D = \bar{D} + \Delta D$, where
$\bar{D} = \{\bar{D}_t : t \ge 0\}$ is the predicted demand process
(deterministic), and $\Delta D = \{\Delta D_t : t \ge 0\}$ is the residual
demand, again assumed to be a random arrival process.

Definition 1. The power imbalance is defined as the residual
demand minus the residual supply:

$$P_t = \Delta D_t - \Delta R_t. \tag{1}$$

The normalized energy imbalance is defined as:

$$W_t = \frac{P_t^2}{2\zeta}. \tag{2}$$



C. Storage 

The storage process is denoted by $s = \{s_t \in [0, \bar{s}] : t \ge 0\}$,
where $s_t$ is the amount of stored energy at time $t$, and
$\bar{s} < \infty$ is the storage capacity. The storage technology is
subject to an upward ramp constraint:

$$\frac{s_t - s_{t'}}{t - t'} \le r, \quad \forall t : 0 \le t < t'.$$

Thus, storage cannot be filled up instantaneously, though it
can be drained (to supply power) instantaneously. Let $U =
\{U_t : t \ge 0\}$ be the power withdrawal process from storage.
The dynamics of storage is then given by:

It is desired to design a causal controller $K$ such that the
control law $U_t = K(s_t, G_t + R_t - D_t)$ maximizes the system
reliability objectives.

Fig. 2. The control layer of the power supply network in Figure

D. Reliability Metric 

We refer to the event of not meeting the demand as a
blackout. The cost of blackouts (COB) metric is defined as
the expected long-term discounted cost of blackouts:

$$\mathbb{E}\left[\int_0^\infty e^{-\theta \tau}\, h\big([P_\tau]^+\big)\, d\tau\right], \tag{4}$$

where $P_{(\cdot)}$ is the power imbalance process, $h$
is an increasing function, and $\theta > 0$ is the discount rate.



E. Problem Formulation 



III. Main Results 



In this section we present the problem formulation. Before 
we proceed, we pose the following assumptions. 

Assumption 1. The normalized energy imbalance process
(2) is the jump process in a compound Poisson process with
arrival rate $Q$ and jump size distribution $f_W$, where the
support of $f_W$ lies within a bounded interval $[0, B]$. The
maximum jump size is thus upper bounded by $B$.

Assumption 2. The forecast supply is equal to the forecast
demand. That is:

$$\bar{D}_t = G_t + \bar{R}_t, \quad \forall t \ge 0.$$

Under Assumption 2, the energy from storage will be used
only to compensate for the power imbalance, since in the
absence of an energy shock, supply is equal to demand, and
storage provides no additional utility. Under Assumptions 1
and 2, the dynamics of the storage process can be written as:

$$s_t = s_0 + \int_0^t \mathbf{1}_{\{s_\tau < \bar{s}\}}\, r\, d\tau - \int_0^t \mu(s_{\tau^-}, W_\tau)\, dN_\tau, \tag{5}$$

where $N_t$ is a Poisson process of rate $Q$, and $W_t$ is the
jump size (energy imbalance) process, drawn independently
and identically from a distribution $f_W$. Further, $\mu$ denotes
a control policy. We focus on stationary Markov policies
since the energy imbalance modeled as a compound Poisson
process is stationary and memoryless. We denote the set of
all such feasible policies by $\Pi$.
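
The storage dynamics in (5) can be explored by simulation. The following Python sketch is illustrative only and is not part of the paper's method: the names `simulate_cob` and `myopic`, the finite `horizon` used to truncate the infinite sum, the unit-size default shocks, and the default parameter values are all assumptions made for the example. Between shocks the storage recharges at rate r up to the capacity, and at each shock the policy decides how much to withdraw; the discounted stage cost of the unserved part is accumulated.

```python
import math
import random


def myopic(s, w):
    """Hypothetical helper: cover as much of the shock w as storage s allows."""
    return min(s, w)


def simulate_cob(policy, g, s0, s_bar, r=1.0, Q=0.8, theta=0.1,
                 draw_w=lambda rng: 1.0, horizon=100.0, n_paths=2000, seed=0):
    """Monte Carlo estimate of the expected discounted blackout cost.

    policy(s, w) returns the withdrawal for storage level s and shock size w;
    g is the stage cost; draw_w samples a shock size (default: size one).
    The infinite horizon is truncated at `horizon` (an approximation).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        t, s, cost = 0.0, s0, 0.0
        while True:
            dt = rng.expovariate(Q)        # inter-arrival time with mean 1/Q
            t += dt
            if t > horizon:
                break
            s = min(s + r * dt, s_bar)     # recharge at ramp rate r, capped at s_bar
            w = draw_w(rng)
            u = policy(s, w)               # energy withdrawn to mask the shock
            cost += math.exp(-theta * t) * g(w - u)
            s -= u
        total += cost
    return total / n_paths
```

As a sanity check under these assumptions, with no storage and a linear stage cost the estimate should be close to $Q\,\mathbb{E}[W]/\theta$, the expected discounted sum of all shocks.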

We are now ready to state the problem formulation. Let
$C_\mu(s)$ denote the expected long-term discounted cost of
blackouts starting from an initial state $s$ and under control
policy $\mu$:

$$C_\mu(s) = \mathbb{E}\left[\sum_{k=1}^{\infty} e^{-\theta t_k}\, g\big(W_k - \mu(s_{t_k^-}, W_k)\big) \,\Big|\, s_0 = s\right], \tag{6}$$

where $t_k$ is the $k$-th Poisson arrival time, and $W_k = W_{t_k}$
is the size of the $k$-th jump. Moreover, $g : [0, B] \to \mathbb{R}$ is the
stage cost as a function of energy imbalance (blackout size).
In this work, we make the following assumption.

Assumption 3. The stage cost function $g(\cdot)$ is bounded,
strictly increasing and continuously differentiable. Moreover,
$\mathbb{E}_W[g(W)] > 0$, and $g(0) = 0$.

The system reliability problem can now be formulated as
an infinite horizon stochastic optimal control problem

$$\min_{\mu \in \Pi} C_\mu(s), \tag{7}$$

where the optimization problem (7) is subject to the state
dynamics (5). A policy $\mu^* \in \Pi$ is defined to be optimal if

$$\mu^* \in \arg\min_{\mu \in \Pi} C_\mu(s).$$

The associated value function or optimal cost function is
denoted by $C(s)$, where

$$C(s) = \min_{\mu \in \Pi} C_\mu(s), \quad 0 \le s \le \bar{s}. \tag{8}$$



A. Characterizations of the Value Function 

We first provide several characterizations for the value
function defined in (8) and establish specific properties that
are useful in characterization of the optimal policy.

Let $J_\mu(s, w)$ be the expected long-term discounted cost
under policy $\mu$ conditioned on the first jump arriving at time
$t_1 = 0$, and being of size $w$. Here, $s$ is the state of the system
before executing the action dictated by the policy. By the
memoryless property of the Poisson process, we have

$$J_\mu(s, w) = g\big(w - \mu(s, w)\big) + \mathbb{E}\left[\sum_{k=1}^{\infty} e^{-\theta t_k}\, g\big(W_k - \mu(s_{t_k^-}, W_k)\big) \,\Big|\, s_0 = s - \mu(s, w)\right]. \tag{9}$$

We may relate $J_\mu(s, w)$ to the total expected cost $C_\mu(s)$
defined in (6) as follows:

$$C_\mu(s) = \mathbb{E}\left[e^{-\theta t_0} J_\mu\big(\min\{s + r t_0, \bar{s}\}, W\big)\right], \tag{10}$$

where $t_0$ is an exponential random variable with mean $1/Q$,
and is independent of $W$, drawn from distribution $f_W$.

From (10), it is clear that from the minimization of $J_\mu(s, w)$
across all admissible policies $\Pi$, we may obtain the optimal
solution to the original problem in (8). The discrete-time
formulation given by (9) facilitates deriving the Bellman
equation as the necessary and sufficient optimality condition,
as well as development of efficient numerical methods. We
summarize these results in the following theorem.

Theorem 1. Given an admissible control policy $\mu \in \Pi$, let
$J_\mu : [0, \bar{s}] \times [0, B] \to \mathbb{R}$ be the function defined as in (9). A
function $J : [0, \bar{s}] \times [0, B] \to \mathbb{R}$ satisfies

$$J(s, w) = J^*(s, w) = \min_{\mu \in \Pi} J_\mu(s, w), \quad \forall (s, w),$$

if and only if it satisfies the following fixed-point equation:

$$J(s, w) = (TJ)(s, w) = \min_{u \in [0, \min\{s, w\}]} \Big\{ g(w - u) + \mathbb{E}\left[e^{-\theta t_0} J\big(\min\{s - u + r t_0, \bar{s}\}, W\big)\right] \Big\}. \tag{11}$$

Moreover, a stationary policy $\mu^*(s, w)$ is optimal if and only
if $u = \mu^*(s, w)$ achieves the minimum in (11) for $J = J^*$.
Finally, the value iteration algorithm

$$J_{k+1} = T J_k \tag{12}$$

converges to $J^*$ for any initial condition $J_0$.



Proof: The result follows from establishing the contraction
property of $T$, which is standard for discounted problems
with bounded stage cost. See [6] for more details. ■
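
The value iteration in Theorem 1 can be sketched numerically on a grid. The Python code below is a minimal illustration under stated assumptions, not the implementation used in the paper: storage levels, shock sizes and withdrawals are restricted to multiples of a step `h`, shocks are taken as uniform on the shock grid, the expectation over the exponential waiting time is approximated by a backward recursion with a trapezoidal average over each grid step, and the function name `value_iteration` and all default parameter values are hypothetical.

```python
import numpy as np


def value_iteration(g, theta=0.1, r=1.0, Q=0.8, s_bar=2.0, B=1.0,
                    h=0.1, tol=1e-9, max_iter=5000):
    """Grid approximation of the fixed point J = TJ; g must accept NumPy arrays."""
    n = int(round(s_bar / h))               # storage grid: s_i = i * h
    m = int(round(B / h))                   # shock grid:   w_j = j * h
    w = np.arange(m + 1) * h
    # Discounting times the no-arrival probability while recharging one grid step.
    decay = np.exp(-(Q + theta) * h / r)
    gain = Q / (Q + theta)                  # E[exp(-theta * t0)] for t0 ~ Exp(Q)
    J = np.zeros((n + 1, m + 1))
    policy = np.zeros((n + 1, m + 1))
    for _ in range(max_iter):
        V = J.mean(axis=1)                  # E_W[J(s_i, W)], W uniform on the grid
        # F[i] approximates E[exp(-theta*t0) * V(min(s_i + r*t0, s_bar))].
        F = np.empty(n + 1)
        F[n] = gain * V[n]
        for i in range(n - 1, -1, -1):
            F[i] = gain * (1.0 - decay) * 0.5 * (V[i] + V[i + 1]) + decay * F[i + 1]
        J_new = np.empty_like(J)
        for i in range(n + 1):
            for j in range(m + 1):
                k = np.arange(min(i, j) + 1)      # candidate withdrawals u = k * h
                cost = g(w[j] - k * h) + F[i - k]
                best = int(np.argmin(cost))
                J_new[i, j] = cost[best]
                policy[i, j] = best * h
        converged = np.max(np.abs(J_new - J)) < tol
        J = J_new
        if converged:
            break
    return J, policy
```

For example, `value_iteration(lambda x: x ** 2)` returns the approximate cost and withdrawal tables for a quadratic stage cost on this grid. The sup-norm stopping rule relies on the contraction property mentioned in the proof; the contraction modulus of the discretized operator is `Q / (Q + theta)`.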

An alternative approach to characterization of the optimal
cost function is based on continuous-time analysis of problem
(8), which leads to the Hamilton-Jacobi-Bellman (HJB) equation.
In the following theorem we present some basic properties
of the optimal cost function as well as the HJB equation.

Theorem 2. Let $C(s)$ be the optimal cost function defined
in (8). The following statements hold:

(i) $C(s)$ is strictly decreasing in $s$.

(ii) If the stage cost $g(\cdot)$ is convex, the optimal cost
function $C(s)$ is also convex in $s$.

(iii) If $C$ is continuously differentiable, then for all $s \in
[0, \bar{s}]$, it satisfies the following HJB equation

$$\frac{dC}{ds}(s) = \frac{Q + \theta}{r}\, C(s) - \frac{Q}{r}\, \mathbb{E}\left[\min_{u \in [0, \min\{s, W\}]} g(W - u) + C(s - u)\right], \tag{13}$$

with the boundary condition

$$\frac{dC}{ds}\bigg|_{s = \bar{s}} = 0. \tag{14}$$

Moreover, the optimal policy achieves the optimal
solution of the minimization problem in (13). Furthermore,
for a given policy $\mu$, if the cost function $C_\mu(s)$ is differentiable,
it satisfies the following delay differential equation

$$\frac{dC_\mu}{ds}(s) = \frac{Q + \theta}{r}\, C_\mu(s) - \frac{Q}{r}\, \mathbb{E}\left[g\big(W - \mu(s, W)\big) + C_\mu\big(s - \mu(s, W)\big)\right], \tag{15}$$

with the boundary condition given by (14).

Proof: See the Appendix. ■

The result of Theorem 2, part (iii) requires continuous
differentiability of the optimal cost function, which can be
established under some mild conditions such as differentiability
of the stage cost function $g$ and the probability
density function $f_W(\cdot)$ of Poisson jumps (cf. Benveniste and
Scheinkman [7]). Throughout this paper, we assume that
$C(s)$ is in fact continuously differentiable and the results
of Theorem 2 are applicable.

B. Characterizations of the Optimal Policy 

In this subsection, we derive some structural properties of
the optimal policy using the optimal cost characterizations
given in Theorems 1 and 2. First, we show that the myopic
policy of allocating reserve energy from storage to cover as
much of every shock as possible is optimal for linear stage
cost functions. Then, we partially characterize the structure
of the optimal policy for strictly convex stage cost functions.

Theorem 3. If the stage cost is linear, i.e., $g(x) = \beta x$ for
some $\beta > 0$, then the myopic policy

$$\mu^*(s, w) = \min\{s, w\} \tag{16}$$

is optimal for problem (8).

Proof: See the Appendix. ■

Next, we focus on nonlinear but convex stage cost functions.
In this case, the myopic policy defined in (16) is
no longer optimal. Intuitively, the myopic policy greedily
consumes the reserve and thereby increases the chance of
a large blackout. In the linear stage cost case, the penalty
for a large blackout is equivalent to the total penalty of
many small blackouts. This is not so in the strictly convex
case, where one large blackout costs more than several small
blackouts of the same total size. Therefore, the optimal policy
in this case tends to be more conservative in consuming the
reserve. Nevertheless, the structure of the optimal policy shows
some similarities with the myopic policy. In the following we
present some characterizations of the structural properties of
the optimal policy using the results from Section III-A.



Assumption 4. The storage process has a positive drift in
the sense that the average rate of energy deficit from the
compound Poisson process is less than the ramp constraint, i.e.,

$$Q\,\mathbb{E}[W] < r.$$

Theorem 4. Let $\mu^*(s, w)$ be the optimal policy associated
with problem (8). If Assumption 4 holds, then $\mu^*(s, w)$ is
monotonically nondecreasing in both $s$ and $w$.

Proof: See the Appendix. ■

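Theorem 5. Let $\mu^*$ denote the optimal policy associated with
problem (8) with strictly convex stage cost $g(\cdot)$. There exists
a unique kernel function $\phi : [-B, \bar{s}] \to \mathbb{R}$ such that

$$\mu^*(s, w) = \big[w - \phi(s - w)\big]^+, \quad \forall (s, w) \in [0, \bar{s}] \times [0, B], \tag{17}$$

where

$$\phi(p) = \arg\min_x \; g(x) + C(x + p) \quad \text{s.t.} \quad x \le \min\{B, \bar{s} - p\}, \quad x \ge \max\{0, -p\}. \tag{18}$$

Moreover, under Assumption 4 we can represent the kernel
function $\phi(p)$ as follows:

$$\phi(p) = \begin{cases} -p, & -B \le p \le b_0 \\ \phi^\circ(p), & b_0 \le p \le b_1 \\ 0, & b_1 \le p \le \bar{s}, \end{cases} \tag{19}$$

where $\phi^\circ(p)$ is the unique solution of

$$g'(x) + C'(x + p) = 0, \tag{20}$$

and $b_0$ and $b_1$ are the break-points, where

$$b_0 = -(g')^{-1}\big(-C'(0)\big) \ge -(g')^{-1}\left(\frac{Q}{r}\,\mathbb{E}[g(W)]\right) \ge -B, \tag{21}$$

$$b_1 = (C')^{-1}\big(-g'(0)\big) \le \bar{s}. \tag{22}$$

Proof: See the Appendix. ■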

Theorem 5 demonstrates a very special structure for the
optimal policy. In fact, it shows that the two-dimensional
policy can be represented using a one-dimensional kernel
function. This result allows us to significantly reduce the
computational complexity of numerical methods for computing
the optimal policy. In addition, using Theorem 5 we can
provide a qualitative picture of the structure of the optimal
policy. Figures 3 and 4 illustrate a conceptual plot of the
kernel function, and the optimal policy, respectively.

Fig. 3. Structure of the kernel function $\phi(p)$ defined in (18).

Fig. 4. Structure of the optimal policy $\mu^*(s, w)$ for a convex stage cost,
for $w = w_1, w_2$.

In particular, we can summarize the characterization of the
optimal policy as follows. If $w \ge -b_0$, we have

$$\mu^*(s, w) = \begin{cases} s, & 0 \le s \le s_0(w) \\ w - \phi^\circ(s - w), & s_0(w) \le s \le s_1(w) \\ w, & s_1(w) \le s \le \bar{s}, \end{cases} \tag{23}$$

where $s_i(w) = w + b_i$ for $i = 0, 1$. In the case where $w <
-b_0$, we have

$$\mu^*(s, w) = \begin{cases} 0, & 0 \le s \le q_0(w) \\ w - \phi^\circ(s - w), & q_0(w) \le s \le s_1(w) \\ w, & s_1(w) \le s \le \bar{s}, \end{cases} \tag{24}$$

where $q_0(w)$ is the unique solution of $\phi^\circ(s - w) = w$.
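
The reduction from a two-dimensional policy to a one-dimensional kernel can be illustrated with a short Python sketch. Everything specific here is an assumption made for illustration: the break-point values, the helper names `make_kernel` and `policy_from_kernel`, and in particular the linear interior segment, which merely stands in for the true interior solution (that solution depends on the stage cost and the optimal cost function and is not computed here).

```python
def make_kernel(b0, b1):
    """Piecewise kernel with break-points b0 < 0 < b1.

    The interior piece is a hypothetical linear interpolation between
    the value -b0 at p = b0 and the value 0 at p = b1.
    """
    def phi(p):
        if p <= b0:
            return -p                      # storage is fully drained
        if p >= b1:
            return 0.0                     # the shock is fully masked
        return -b0 * (b1 - p) / (b1 - b0)  # stand-in for the interior solution
    return phi


def policy_from_kernel(phi, s, w):
    """Withdrawal [w - phi(s - w)]^+ for storage level s and shock size w."""
    return max(0.0, w - phi(s - w))
```

With the hypothetical values `phi = make_kernel(-0.6, 0.9)`, a shock of size 0.8 arriving at a low storage level drains the storage completely, the same shock at a high storage level is fully masked, and a small shock of size 0.3 arriving at empty storage triggers no withdrawal, matching the three regimes summarized above.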

IV. Numerical Simulations 

In this part, we present numerical characterizations of 
the optimal cost function and optimal policy in different 
scenarios. Moreover, we study the effect of storage size and 
volatility on system performance, for various control policies. 



We use the value iteration algorithm (12) to compute 
the optimal policy and cost function for nonlinear stage 
costs. Figures 5 and 6 illustrate the optimal policy and cost
function in a scenario with uniformly distributed random 
jumps, quadratic stage cost, and the following parameters: 
= 0.1, r = 1,(5 = 0.8,5 = 2. Observe that the optimal 
policy complies with the conceptual Figure 4.

Figure 7 shows the value of storage, defined as the normalized
improvement in expected cost due to energy storage, for
different Poisson arrival rates. In this case 6 = 0.01, ^(x) = 
x^,r = l^W = 1. Note that the storage process has a 
negative drift if and only if Q > 1. Observe that in the 
positive or zero drift cases, even a small value of storage 
yields a significant effect in reducing the blackout cost. 
However, in the negative drift case, the value of storage is 
significantly lower. Observe that for the negative drift case,
there is a critical storage size that yields a sharp improvement
in the value of storage.

Fig. 5. Optimal policy computed by value iteration algorithm (12) for
quadratic stage cost and uniform shock distribution.

Fig. 6. Optimal cost function computed by value iteration algorithm (12)
for quadratic stage cost and uniform shock distribution.

A. Blackout Statistics 



We discussed in Section III-B that the myopic policy given
by (16) is not necessarily optimal for nonlinear stage cost
functions. In this part, we study the effect of different optimal
policies, in the sense of (7), for different stage costs on the
distribution of large blackouts. Figure 8 shows the blackout
distribution in a scenario with deterministic jumps of size
one, for both the myopic policy and the optimal policy for a
cubic cost function. Note that the stage cost for the non-myopic
policy assigns a significantly higher weight to larger
blackouts. Therefore, as we can see in Figure 8, the non-myopic
policy results in less frequent large blackouts at the
price of more frequent small blackouts.

Next, we study the effect of storage size on the probability
of large blackouts. Figure 9 plots this metric for different
policies that are all optimal for different stage cost functions.
Similarly to Figure 7, we observe a sharp improvement of
the reliability metric at a critical storage size. It is worth
mentioning that given a target reliability metric, the storage
size required by the optimal policy with cubic stage cost is
about half of what is required by the myopic policy.

Fig. 7. Value of energy storage as a function of the storage capacity for
different Poisson arrival rates. $C(s \mid \bar{s})$ denotes the optimal cost function (8)
when the storage capacity is given by $\bar{s}$.

Fig. 8. Blackout distribution comparison of myopic and non-myopic
policies (deterministic jumps with rate $Q = 0.8$). Legend: optimal policy
for cubic stage cost; myopic policy (optimal for linear stage cost).

Finally, we compare the reliability of myopic and non-myopic
policies in terms of probability of large blackouts as
a function of the volatility of the demand/supply process. We
define volatility as the energy of the shock process, i.e.,

$$\text{volatility} = Q\,\mathbb{E}[W^2],$$

which depends both on the mean and variance of the compound
arrival process. Figure 10 demonstrates large blackout
probabilities as a function of volatility, for a system with
uniformly distributed jumps with constant mean $\mathbb{E}[W] = 1$.
As shown in Figure 10, higher volatility increases the probability
of large blackouts in an almost linear fashion.
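
To make the dependence of volatility on both mean and variance concrete, the following Python sketch evaluates it for uniformly distributed jumps. The parametrization by a mean and a half-width (jumps uniform on an interval centered at the mean) and the function name `uniform_volatility` are assumptions made for this example; the text does not specify how the uniform distributions were parametrized.

```python
def uniform_volatility(Q, mean, half_width):
    """Volatility Q * E[W^2] for W uniform on [mean - half_width, mean + half_width]."""
    variance = half_width ** 2 / 3.0        # (interval length)^2 / 12
    second_moment = variance + mean ** 2    # E[W^2] = Var(W) + (E[W])^2
    return Q * second_moment
```

With the mean held at one and `Q = 1.0`, widening the interval from a point mass to a half-width of 0.6 raises the volatility from 1.0 to about 1.12, so in this parametrization volatility changes only through the variance.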



V. Conclusions

We examined the reliability value of storage in a power
supply network with uncertainty in supply/demand and upward
ramp constraints on both supply and storage. The uncertainty
was modeled as a compound Poisson arrival of energy
deficit shocks. We formulated the problem of optimal control
of storage for maximizing system reliability as minimization,
over all stationary Markovian control policies, of the infinite
horizon expected discounted cost of blackouts. We showed
that for a linear stage cost, a myopic policy which uses
storage to compensate for all shocks regardless of their size is
optimal. However, for strictly convex stage costs the myopic
policy is not optimal. Our results suggest that for high ratios
of the average rate of shock size to storage ramp rate, there
is a critical level of storage size above which the value
of additional capacity quickly diminishes. For ratios around
three and above, there seems to be another critical level below
which storage capacity provides very little value. Finally,
our results suggest that for all control policies, there seems to
be a critical level of storage size above which the probability
of suffering large blackouts diminishes quickly.



Fig. 9. Probability of large blackouts as a function of storage size for 
different policies (deterministic jumps with rate Q = 1.0). 




Fig. 10. Probability of large blackouts vs. volatility for different policies 
(uniformly distributed random jumps with Q = 1.0 and E[W] = 1). 

Appendix 

Proof of Theorem 2, Part (i): The monotonicity property
of the value function follows almost immediately from the
definition. Let $0 \le s_1 < s_2 \le \bar{s}$, and assume $C(s_1) = C_\mu(s_1)$
for some policy $\mu$. Given the initial state $s_1$, let $u^{(1)}$ be the
control process under policy $\mu$. Note that for every realization
$\omega$ of the compound Poisson process, the sample path $u^{(1)}(\omega)$
is admissible for initial condition $s_2 > s_1$. Therefore, by
definitions (6) and (8), we have $C(s_2) \le C(s_1)$.

In order to show the strict monotonicity, consider the
controlled process starting from $s_1$. Let $\tau$ be the first arrival
time such that $g(W_\tau - u^{(1)}_\tau) > 0$. By Assumption 3 we have
$P(\tau \in [0, T]) > 0$ for some $T < \infty$. For every sample path
$\omega$, define the control process

$$u^{(2)}_t(\omega) = u^{(1)}_t(\omega) + \delta\, \mathbf{1}_{\{t = \tau(\omega)\}},$$

for some $\delta > 0$ such that $\delta \le \min\{s_2 - s_1, W_{\tau(\omega)} - u^{(1)}_{\tau(\omega)}\}$.
It is clear that $u^{(2)}_t(\omega)$ is admissible for the controlled
process starting from $s_2$. Using the definition of the expected
cost function in (6), we can write

$$C(s_1) - C(s_2) \ge \mathbb{E}\big[\epsilon\, e^{-\theta \tau(\omega)}\big] \ \text{ for some } \epsilon > 0, \qquad \mathbb{E}\big[\epsilon\, e^{-\theta \tau(\omega)}\big] \ge \epsilon\, e^{-\theta T} P(\tau \in [0, T]) > 0,$$

where the first inequality holds by strict monotonicity of $g$.

Part (ii): We first prove convexity of $J^*(s, w)$ defined in
Theorem 1, and use it to establish convexity of $C(s)$.

In order to show convexity of $J^*(s, w)$, we need to show
that the operator $T$ defined in (11) preserves convexity. Then
the claim would be immediate using the convergence of value
iteration algorithm (12) to optimal cost $J^*$, where the initial
condition is an arbitrary convex function such as $J_0 = 0$.

Next we show that the operator $T$ preserves convexity for
this particular problem. Define the objective function in (11)
as $\mathcal{Q}(s, w, u)$. We have

$$\mathcal{Q}(s, w, u) = g(w - u) + \mathbb{E}\left[e^{-\theta t_0} J\big(\min\{s - u + r t_0, \bar{s}\}, W\big)\right] = g(w - u) + \int_0^\infty e^{-\theta t_0}\, \mathbb{E}\left[J\big(\min\{s - u + r t_0, \bar{s}\}, W\big)\right] Q e^{-Q t_0}\, dt_0.$$

Using the fact that $J$ is convex, linearity of expectation and
the basic definition of a convex function, it is straightforward
but tedious to show that $\mathcal{Q}(s, w, u)$ is a convex function.
We omit the details for brevity. Given the convexity of $\mathcal{Q}$,
the convexity of $(TJ)(s, w)$ is immediate, since we are
minimizing a multidimensional convex function over one
of its dimensions. Hence, we have established convexity of
$J^*(s, w)$ in $(s, w)$. Finally, we can express $C(s)$ in terms of
$J^*(s, w)$ as in (10). This results in convexity of $C(s)$ using
the above argument for proving convexity of $\mathcal{Q}(s, w, u)$.

Part (iii): The derivation of the Hamilton-Jacobi-Bellman equation is
relatively standard. We omit the proof for brevity, and present
a proof sketch based on the principle of optimality in O. For a
more detailed treatment, please refer to [8], [9] and [10].

Proof of Theorem 3: We establish optimality of $\mu^*$ by
showing that it achieves an expected cost no higher than any
other admissible policy. Consider an admissible policy $\tilde{\mu}$ such
that $\tilde{\mu}(s, w) < \min\{s, w\}$ for some $(s, w) \in [0, \bar{s}] \times [0, B]$.
For every sample path $\omega$ of the controlled process, let $\tau_1(\omega)$ be
the first Poisson arrival time such that

$$\min\{s_{\tau_1^-}, W_{\tau_1}\} - \tilde{\mu}(s_{\tau_1^-}, W_{\tau_1}) = \epsilon > 0.$$

Therefore, by applying policy $\tilde{\mu}$ instead of $\mu^*$, we pay
an extra penalty of $\beta \epsilon e^{-\theta \tau_1(\omega)}$. The reward for this extra
penalty is that the state process is now biased by at most
$\epsilon$, which allows us to avoid later penalties. However, since
the stage cost is linear, the penalty reduction by this bias for
any time $\tau_2(\omega) > \tau_1(\omega)$ is at most $\beta \epsilon e^{-\theta \tau_2(\omega)}$. Hence, for
this sample path $\omega$, the policy $\tilde{\mu}$ does worse than the myopic
policy $\mu^*$ at least by $\beta \epsilon \big(e^{-\theta \tau_1(\omega)} - e^{-\theta \tau_2(\omega)}\big) > 0$. Therefore,
by taking the expectation over all sample paths, the myopic
policy cannot do worse than any other admissible policy.
Note that this argument does not prove the uniqueness of
$\mu^*$ as the optimal policy. In fact, we may construct optimal
policies that are different from $\mu^*$ on a set $A \subset [0, \bar{s}] \times [0, B]$,
where $P\big((s_{t^-}, W_t) \in A\big) = 0$. ■

We delay the proof of Theorem 4 until after the proof of
Theorem 5. Let us start with some useful lemmas on the
structure of the kernel function.

Lemma 1. Let $\phi(p)$ be defined as in (18). We have
1) If $\phi(p_0) = -p_0$ for some $p_0$, then

$$\phi(p) = -p, \quad \text{for all } p \le p_0.$$

2) If $\phi(p_1) = 0$ for some $p_1$, then

$$\phi(p) = 0, \quad \text{for all } p \ge p_1.$$

Proof: By convexity of the stage cost function and
Theorem 2(ii), $\phi(p)$ is the optimal solution of a convex
program. Therefore, if $\phi(p_0) = -p_0$ for some $p_0 < 0$, we
have

$$g'(-p_0) + C'(0) \ge 0.$$

Thus, by convexity of the stage cost, $g'(-p) \ge g'(-p_0)$, for any
$p \le p_0$. Therefore, by convexity of $C(\cdot)$ and $g(\cdot)$,

$$g'(x) + C'(x + p) \ge g'(-p) + C'(0) \ge 0, \quad \text{for all } x \ge -p,$$

which immediately implies optimality of $(-p)$, for $p \le p_0$.

Similarly, for the case where $\phi(p_1) = 0$, we have $g'(0) +
C'(p_1) \ge 0$, which implies

$$g'(x) + C'(x + p) \ge g'(0) + C'(p) \ge 0, \quad \text{for all } p \ge p_1,$$

hence, the objective is nondecreasing for all feasible $x$ and
$\phi(p) = 0$. ■

Lemma 2. Let $C(s)$ be defined as in (8) and assume that
the stage cost $g(\cdot)$ is convex. Then

$$\frac{dC}{ds}(s) \ge -\frac{Q}{r}\, \mathbb{E}_W[g(W)], \quad 0 \le s \le \bar{s}. \tag{25}$$

Proof: By Theorem 2(ii), the optimal cost function $C(s)$
is convex. Hence, $\frac{dC}{ds}(s) \ge \frac{dC}{ds}(0)$. On the other hand, by
Theorem 2(iii), we can write

$$\frac{dC}{ds}(0) = \frac{Q + \theta}{r}\, C(0) - \frac{Q}{r}\, \mathbb{E}_W\left[\min_{u = 0}\, g(W - u) + C(0)\right] = \frac{\theta}{r}\, C(0) - \frac{Q}{r}\, \mathbb{E}_W[g(W)].$$

Since $C(0) \ge 0$, combining the two preceding relations proves the claim. ■

Lemma 3. If Assumption 4 holds, then the first constraint
in (18) is never active, i.e., $\phi(p) < \min\{B, \bar{s} - p\}$.

Proof: We show that under Assumption 4 the slope
of the objective function is always non-negative at $x =
\min\{B, \bar{s} - p\}$. In the case where $\bar{s} - p \le B$, we have

$$\frac{d}{dx}\big(g(x) + C(x + p)\big)\Big|_{x = \bar{s} - p} = g'(\bar{s} - p) + C'(\bar{s}) \ge 0,$$

where the inequality follows from monotonicity of $g$ and
(14). For the case where $\bar{s} - p > B$, we employ Lemma 2
and Assumption 4 to write

$$\frac{d}{dx}\big(g(x) + C(x + p)\big)\Big|_{x = B} = g'(B) + C'(B + p) \ge g'(B) - \frac{Q}{r}\, \mathbb{E}_W[g(W)] \ge g'(B) - \frac{\mathbb{E}[g(W)]}{\mathbb{E}[W]} \ge 0,$$

where the last inequality holds because $g(w) \le w\, g'(B)$, for
$w \le B$, which is a convexity result. ■

Proof of Theorem 5: By Theorem 2(iii), we can characterize
the optimal policy as

$$\mu^*(s, w) = \arg\min_u \; g(w - u) + C(s - u) \quad \text{s.t.} \quad 0 \le u \le \min\{s, w\}. \tag{26}$$

Note that the optimization problem in (26) is convex,
because $g(\cdot)$ and hence $C(\cdot)$ is convex (cf. Theorem 2(ii)).
Using the change of variables

$$x = w - u, \quad p = s - w,$$

we can rewrite (26) as $\mu^*(s, w) = w - \tilde{\phi}(s - w, w)$, where

$$\tilde{\phi}(p, w) = \arg\min_x \; g(x) + C(p + x) \quad \text{s.t.} \quad x \ge \max\{0, -p\}, \quad x \le w. \tag{27}$$

The optimization problem in (27) depends on both parameters
$p$ and $w$. We may remove the dependency on $w$
as follows. Since $w \le \min\{B, \bar{s} - p\}$, we may relax the last
constraint, $x \le w$, by replacing it with $x \le \min\{B, \bar{s} - p\}$.
The optimal solution of the relaxed problem is the same
as $\phi(p)$ defined in (18). If $\phi(p) \le w$, then the relaxed
constraint is not active, and $\phi(p)$ is also the solution of (27).
Otherwise, since we have a convex problem, the constraint
$x \le w$ must be active, which uniquely identifies the optimal
solution as $w$. Therefore, the optimal solution of the problem
in (27) is given by $\tilde{\phi}(p, w) = \min\{\phi(p), w\}$. Combining the
preceding relations, we obtain

$$\mu^*(s, w) = w - \min\{\phi(s - w), w\} = \big[w - \phi(s - w)\big]^+.$$

The representation in (19) is a direct consequence of
Lemmas 1 and 3. Between some break-points $b_0$ and $b_1$,
the optimal solution of (18) can only be an interior solution,
which is given by (20). The uniqueness of $\phi^\circ(p)$ follows from
strict convexity of $g$. Finally, by continuous differentiability
of the cost function, equation (20) should hold at the break-points
as well. Therefore,

$$g'(-b_0) + C'\big(b_0 + (-b_0)\big) = 0, \qquad g'(0) + C'(0 + b_1) = 0,$$

which is equivalent to the characterizations in (21) and (22).
The first inequality in (21) holds by Lemma 2 and convexity
of $g(\cdot)$, and the second inequality holds by Assumption 4 and
applying convexity of $g(\cdot)$ again. ■



Lemma 4. Let $\phi(p)$ be defined as in (18), and assume that
Assumption 4 holds and the stage cost $g(\cdot)$ is strictly convex.
Then for all $p_1 \le p_2$,

$$-(p_2 - p_1) \le \phi(p_2) - \phi(p_1) \le 0. \tag{28}$$

Proof: We first establish the monotonicity of $\phi$. Let
$p_1 \le p_2$. Given the structure of the kernel function in (19),
there are multiple cases to consider, for most of which the
claim is immediate using (19). We only present the case
where $-B \le p_1 \le b_1$ and $b_0 \le p_2 \le b_1$. A necessary
optimality condition at $p_1$ is given by

$$g'(\phi(p_1)) + C'(p_1 + \phi(p_1)) \ge 0. \tag{29}$$

Similarly, for $p_2$, we must have

$$g'(\phi(p_2)) + C'(p_2 + \phi(p_2)) = 0. \tag{30}$$

Now, assume $\phi(p_2) > \phi(p_1)$. By convexity of $C(\cdot)$ (cf.
Theorem 2(ii)) and strict convexity of $g(\cdot)$, we obtain

$$g'(\phi(p_2)) + C'(p_2 + \phi(p_2)) > g'(\phi(p_1)) + C'(p_1 + \phi(p_1)) \ge 0,$$

which is a contradiction to (30).

For the second part of the claim, again, we should consider
several cases depending on the interval to which $p_1$ and $p_2$
belong. Here, we present the case where $b_0 \le p_1 \le b_1$ and
$b_0 \le p_2 \le \bar{s}$. The remaining cases are straightforward using
(19). In this case, we have

$$g'(\phi(p_1)) + C'(p_1 + \phi(p_1)) = 0, \tag{31}$$

$$g'(\phi(p_2)) + C'(p_2 + \phi(p_2)) \ge 0. \tag{32}$$

Combine the optimality conditions in (31) and (32) to get

$$g'(\phi(p_2)) + C'(p_2 + \phi(p_2)) \ge g'(\phi(p_1)) + C'(p_1 + \phi(p_1)). \tag{33}$$

Assume $\phi(p_2) < \phi(p_1)$; otherwise, the claim is trivial.
By strict convexity of $g(\cdot)$, we have $g'(\phi(p_2)) < g'(\phi(p_1))$.
Therefore by (33), it is true that

$$C'(p_2 + \phi(p_2)) > C'(p_1 + \phi(p_1)). \tag{34}$$

Now assume $\phi(p_2) - \phi(p_1) < -(p_2 - p_1)$. By rearranging
the terms of this inequality and invoking the convexity of
$C(\cdot)$, we get $C'(p_2 + \phi(p_2)) \le C'(p_1 + \phi(p_1))$, which is in
contradiction to (34). Therefore, the claim holds. ■



Proof of Theorem 4: First, note that by Lemma 4 we get

$$\phi(s_2 - w) \le \phi(s_1 - w), \quad \text{for all } s_1 \le s_2,$$

which implies (cf. Theorem 5)

$$\mu^*(s_2, w) = \big[w - \phi(s_2 - w)\big]^+ \ge \big[w - \phi(s_1 - w)\big]^+ = \mu^*(s_1, w).$$

Moreover, for all $s$ and $w_1 \le w_2$, we can use the second
part of Lemma 4 to conclude

$$\phi(s - w_1) - \phi(s - w_2) \ge -(w_2 - w_1).$$

By rearranging the terms, it follows that

$$\mu^*(s, w_2) = \big[w_2 - \phi(s - w_2)\big]^+ \ge \big[w_1 - \phi(s - w_1)\big]^+ = \mu^*(s, w_1),$$

which completes the proof. ■